Reducing Semantic Drift with Bagging and Distributional Similarity
Authors
Abstract
Iterative bootstrapping algorithms are typically compared using a single set of handpicked seeds. However, we demonstrate that performance varies greatly depending on these seeds, and favourable seeds for one algorithm can perform very poorly with others, making comparisons unreliable. We exploit this wide variation with bagging, sampling from automatically extracted seeds to reduce semantic drift. However, semantic drift still occurs in later iterations. We propose an integrated distributional similarity filter to identify and censor potential semantic drifts, ensuring over 10% higher precision when extracting large semantic lexicons.
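The idea lends itself to a compact illustration. The Python sketch below is not the authors' implementation: the candidate extractor is passed in as a user-supplied function, the context window, number of bags, bag size and drift threshold are arbitrary assumed defaults, and distributional similarity is reduced to a simple context-vector cosine. It samples several seed bags, bootstraps each bag independently, censors candidates whose similarity to that bag's seeds falls below the threshold, and aggregates the surviving terms across bags.

    import random
    from collections import Counter
    from math import sqrt

    def context_vector(term, sentences, window=2):
        # Bag-of-words context counts for a term (one simple, assumed notion of context).
        vec = Counter()
        for sent in sentences:
            for i, tok in enumerate(sent):
                if tok == term:
                    lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                    vec.update(sent[j] for j in range(lo, hi) if j != i)
        return vec

    def cosine(u, v):
        # Cosine similarity between two sparse count vectors.
        num = sum(w * v[k] for k, w in u.items() if k in v)
        den = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
        return num / den if den else 0.0

    def bagged_bootstrap(seed_pool, sentences, extract_candidates,
                         n_bags=10, bag_size=10, iterations=5, drift_threshold=0.3):
        # Hypothetical sketch: sample seed bags, bootstrap each independently,
        # and censor candidates that have drifted away from the seeds.
        seed_vecs = {s: context_vector(s, sentences) for s in seed_pool}
        votes = Counter()
        for _ in range(n_bags):
            seeds = random.sample(seed_pool, min(bag_size, len(seed_pool)))
            accepted = set(seeds)
            for _ in range(iterations):
                for cand in extract_candidates(accepted):  # user-supplied extractor
                    sim = max(cosine(context_vector(cand, sentences), seed_vecs[s])
                              for s in seeds)
                    if sim < drift_threshold:               # likely semantic drift
                        continue
                    accepted.add(cand)
            votes.update(accepted)                          # vote across bags
        return votes

Terms that survive the filter in many bags receive more votes, so thresholding the returned counter yields a lexicon that is less sensitive to any single choice of seeds.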
Similar papers
Semi-supervised Semantic Pattern Discovery with Guidance from Unsupervised Pattern Clusters
We present a simple algorithm for clustering semantic patterns based on distributional similarity and use cluster memberships to guide semi-supervised pattern discovery. We apply this approach to the task of relation extraction. The evaluation results demonstrate that our novel bootstrapping procedure significantly outperforms a standard bootstrapping procedure. Most importantly, our algorithm can effect...
What can distributional semantic models tell us about part-of relations?
The term Distributional semantic models (DSMs) refers to a family of unsupervised corpus-based approaches to semantic similarity computation. These models rely on the distributional hypothesis (Harris, 1954), which states that semantically related words tend to share many of their contexts. So, by collecting information about the contexts in which words are used in a corpus, DSMs are able to me...
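As a concrete, deliberately tiny illustration of that hypothesis, the sketch below builds PPMI-weighted context vectors from a toy corpus and compares words by cosine. The window size, the PPMI weighting, and the toy sentences are assumptions chosen for illustration, not details taken from the paper above.

    from collections import Counter
    from math import log, sqrt

    def ppmi_vectors(sentences, window=2):
        # Build PPMI-weighted context vectors, a common count-based DSM construction.
        cooc, word_tot, ctx_tot, total = {}, Counter(), Counter(), 0
        for sent in sentences:
            for i, w in enumerate(sent):
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j == i:
                        continue
                    cooc.setdefault(w, Counter())[sent[j]] += 1
                    word_tot[w] += 1
                    ctx_tot[sent[j]] += 1
                    total += 1
        return {w: {c: log(n * total / (word_tot[w] * ctx_tot[c]))
                    for c, n in ctxs.items()
                    if n * total > word_tot[w] * ctx_tot[c]}  # keep positive PMI only
                for w, ctxs in cooc.items()}

    def cosine(u, v):
        # Cosine similarity between two sparse weighted vectors.
        num = sum(w * v.get(c, 0.0) for c, w in u.items())
        den = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
        return num / den if den else 0.0

    # Toy corpus: "cat" and "dog" share more contexts than "cat" and "ball",
    # so the first similarity comes out higher.
    sents = [["the", "cat", "chased", "the", "mouse"],
             ["the", "dog", "chased", "the", "ball"],
             ["the", "cat", "ate", "the", "fish"]]
    vecs = ppmi_vectors(sents)
    print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["ball"]))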
On the use of distributional models of semantic space to investigate human cognition
Huettig et al. (2006) demonstrated that corpus-based measures of word semantics predict language-mediated eye movements in the visual world. These data, in conjunction with the evidence from other tasks, are strong evidence for the psychological validity of corpus-based semantic similarity measures. But can corpus-based distributional models be more than just good measures of semantic similarity? ...
Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation
Recent developments in distributional semantics (Mikolov et al., 2013) include a new class of prediction-based models that are trained on a text corpus and that measure semantic similarity between words. We discuss the relevance of these models for psycholinguistic theories and compare them to more traditional distributional semantic models. We compare the models' performances on a large datase...
The Distributional Similarity of Sub-Parses
This work explores computing distributional similarity between sub-parses, i.e., fragments of a parse tree, as an extension to general lexical distributional similarity techniques. In the same way that lexical distributional similarity is used to estimate lexical semantic similarity, we propose using distributional similarity between sub-parses to estimate the semantic similarity of phrases. Suc...